METHODS AND APPARATUS FOR REPLACING COOLING 
SYSTEMS IN OPERATING COMPUTERS 



Technical Field 

[0001] This invention relates to cooling computers and also to
electronic devices generally which require cooling for continuous
operation. Particular embodiments of the invention relate to maintaining
computer cooling systems. Some specific embodiments of the invention
permit the replacement of cooling fans in operating computers.


Background 

[0002] Computer data processing chips such as CPUs (central 
processing units), GPUs (graphics processing units) and the like are 
becoming increasingly powerful. This increase in performance has been 
accomplished by increasing clock frequencies, shrinking geometries within
integrated circuits, and adding additional logic for more features. 

[0003] Current high performance data processing chips generate
significant amounts of heat. For example, some state of the art CPUs
generate heat in excess of 80 watts. Since excessive temperatures can
damage integrated circuits, it is common to provide active cooling systems
for CPUs and similar devices. For example, it is common to attach large
heat sinks to CPU chips and to provide a fan to ensure that there is
adequate cooling air flow through the heat sink at all times while the
computer is operating. If the air flow is interrupted for as little as a minute
or two, the CPU can be destroyed by excessive heat buildup.

[0004] The fan may be mounted directly on the CPU heat sink to
push air past the fins of the heat sink. The fan may alternatively be
mounted elsewhere in the computer or on the surface of the computer's
case. The fan is typically mounted in such a way that its air flow is
directed to the vicinity of the CPU.

[0005] Like any other devices with moving mechanical parts, cooling
fans can fail. If the cooling fan fails, air flow is interrupted. As a result,
heat builds up in the CPU and the CPU's temperature can rise quickly to
critical levels. Many modern computers prevent destruction of the CPU in
such an eventuality by providing a system for monitoring the die
temperature in the CPU. If the temperature of the die increases beyond a
threshold temperature, the CPU is shut down. Shutdown of the CPU
typically occurs very abruptly with no warning to software. The CPU
essentially crashes. After the computer is restarted, it is necessary to return
the CPU to an appropriate state and/or clean up any corrupted data
resulting from the CPU crash before the computer can resume its intended
role. The computer could be out of service for a significant period of time
before a fan failure is detected and corrected.


[0006] In recent years, cooling fans have been improved such that
incipient failures can be detected. Many cooling fans have voltage sensors
and fan speed sensors. If the fan speed drops slowly over time then this
may indicate that the fan is becoming clogged with dust and requires
cleaning. An increase in the fan voltage which is not accompanied by a
corresponding increase in the fan speed may indicate that the fan's
bearings are starting to fail. With these improvements, it is sometimes
possible to detect emerging problems before the fan fails. Computers are
increasingly provided with software that monitors these sensors while the
computer is operating. It is possible to shut down the computer gracefully
to replace the fan instead of waiting for it to crash after the fan fails. If a
graceful shutdown is achieved then the computer will be out of service for
a shorter interval.
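
By way of illustration only, the two sensor heuristics described above can be sketched in Python. The names FanSample and diagnose_fan and both threshold values are assumptions made for this sketch; they do not appear in the description and would need tuning for any real fan.

```python
from dataclasses import dataclass


@dataclass
class FanSample:
    voltage: float  # volts applied to the fan
    rpm: float      # measured fan speed


def diagnose_fan(earlier: FanSample, later: FanSample,
                 rpm_drop_frac: float = 0.10,
                 volt_rise_frac: float = 0.10) -> str:
    """Compare an older and a newer reading and return a coarse diagnosis.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    rpm_change = (later.rpm - earlier.rpm) / earlier.rpm
    volt_change = (later.voltage - earlier.voltage) / earlier.voltage

    # Voltage rose, but speed did not rise correspondingly.
    if volt_change >= volt_rise_frac and rpm_change < volt_change / 2:
        return "possible bearing wear"
    # Speed fell over time.
    if rpm_change <= -rpm_drop_frac:
        return "possible dust clogging"
    return "ok"
```

A monitoring program could call diagnose_fan periodically and schedule a fan replacement when the result is anything other than "ok".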

[0007] Some computers are required to operate continuously for long
periods, in so-called "24x7" operation. For example, a computer may
process sales orders for an online shopping web site. If such a computer is
shut down to replace a cooling fan, revenue may be lost in direct
proportion to the length of time that the computer is out of service. It is
highly desirable to avoid shutting down the computer altogether or at least
to minimize the length of time that the computer is out of service.


[0008] As another example, modern high performance computing
systems (i.e. supercomputers) typically consist of hundreds or thousands of
interconnected rack-mounted computers. Such computer systems often
run a computer intensive application for hours or days across all of the
computers making up such a system. The application runs a program on
each of the computers. The programs communicate among themselves to
share intermediate results. If one computer fails, the whole application will
stop executing or fail. This may result in the loss of several hours' or days'
worth of results.



[0009] To satisfy the needs of 24x7 operation, high performance
computing systems, and other situations with similar requirements, it is
desirable to find a way to change a cooling fan without interrupting the
operation of a computer and without risking destruction of the CPU due to
excessive heat.

Summary of the Invention

[0010] One aspect of this invention provides a method for servicing a
cooling system for an electronic device. The electronic device may be a
CPU or a GPU in specific embodiments of the invention. The method
comprises switching the electronic device from a normal operating mode
wherein the electronic device generates heat to a reduced heat generating
mode wherein the electronic device generates heat at a reduced rate;
continuing to operate the electronic device in the reduced heat generating
mode while the cooling system is being serviced; and, subsequently
switching the electronic device from the reduced heat generating mode to
the normal operating mode.

[0011] A further aspect of the invention provides a method for
servicing apparatus which includes a cooling system for an electronic
device. The method comprises operating the apparatus in a temperature
control mode in which temperature rise in the electronic device is reduced;
continuing to operate the apparatus in the temperature control mode while
the cooling system is being serviced; and, subsequently switching the
apparatus back to a normal operating mode. The temperature control mode
is a mode in which the electronic device is operated in a reduced heat
generating mode, supplementary active cooling is provided to the
electronic device, or both.

[0012] Another aspect of the invention provides electronic apparatus
comprising: a heat generating electronic device, which is a CPU or GPU
in some embodiments of the invention; and a cooling system operational to
cool the electronic device. In some embodiments of the invention the
cooling system comprises a fan. The electronic apparatus comprises a
maintenance procedure controller configured to switch the electronic
device from a normal operating mode wherein the electronic device
generates heat to a reduced heat generating mode wherein the electronic
device generates heat at a reduced rate upon detection of a signal
indicating that the cooling system is about to be serviced and to switch the
electronic device from the reduced heat generating mode to the normal
operating mode upon detection of a signal indicating that servicing of the
cooling system has been completed. The maintenance procedure controller
is a programmable data processor in some embodiments of the invention.


[0013] Further aspects of the invention and features of specific 
embodiments of the invention are described below. 

Brief Description of the Drawings 
[0014] In drawings which illustrate non-limiting embodiments of the
invention:

Figure 1 is a block diagram of a CPU cooling apparatus according
to one embodiment of the invention;

Figure 2 is a flow chart illustrating a method for replacing a cooling
system for a data processing chip without requiring the chip to be shut
down completely;

Figure 3 is a block diagram illustrating an apparatus according to
another embodiment of the invention;

Figure 4 is a timing diagram illustrating a possible mode of
operation of the apparatus of Figure 3;

Figure 5 is a flow chart illustrating a method according to another
embodiment of the invention;

Figure 6 is a block diagram illustrating an apparatus according to a
further embodiment of the invention; and

Figure 7 is a schematic view of a computer system according to an
example embodiment of the invention.



Description 

[0015] Throughout the following description, specific details are set
forth in order to provide a more thorough understanding of the invention.
However, the invention may be practiced without these particulars. In
other instances, well known elements have not been shown or described in
detail to avoid unnecessarily obscuring the invention. Accordingly, the
specification and drawings are to be regarded in an illustrative, rather than
a restrictive, sense.


[0016] This invention provides methods for repairing or replacing
cooling systems for data processing chips which do not require the data
processing chips to be shut down throughout the repair or replacement
procedure. The methods involve temporarily shifting the data processing
chips into a mode in which the data processing chips are still operating and
yet generate less heat during a period while the cooling system is not
operating.

[0017] The following description describes the application of the
invention to cooling fans for CPUs. The invention may also be applied to
other types of data processing chips such as graphics processors and the
like. The invention may further be applied in systems which include
cooling systems other than, or in addition to, fans.

[0018] Figure 1 shows a system 10 according to one implementation
of the invention. System 10 includes a maintenance procedure controller
20. Maintenance procedure controller 20 comprises logic circuits which
are connected to control a clock speed at which a CPU 50 operates. CPU
50 is cooled by a cooling system which includes a heat sink 52 and a fan
54. In the illustrated embodiment, maintenance procedure controller 20
communicates signals 110, 120 to a clock controller 30. Clock controller
30, in turn, generates a signal 140 which controls the clock frequency of a
clock signal 150 generated by a clock generator 40.

[0019] Maintenance procedure controller 20 receives signals which
indicate that a fan replacement procedure is commencing, or will
imminently commence. In the illustrated embodiment, maintenance
procedure controller 20 is connected to receive a Start Fan Replacement
Procedure command 60. Maintenance personnel may cause command 60
to be issued through a user interface (e.g. textual command, GUI) or via a
manual control (e.g. a button). Command 60 may, for example, originate at
a console (not shown) which includes mechanisms for the overall
administration of a system which includes processor 50. The system may
include many other processors. For example, the system may be a
multiprocessor computer system having, for example, several hundred
CPUs.

[0020] Upon receiving command 60, maintenance procedure
controller 20 commences performing a method 200 for permitting
replacement of fan 54. Method 200 is shown in Figure 2. Upon receiving
signal 60, maintenance procedure controller 20 enters a mode in which it
waits for an About to Replace Fan signal 70 from maintenance personnel.
Signal 70 may be provided via a button or control panel on the computer in
which processor 50 is located. Upon block 220 determining that a signal
70 has been received, maintenance procedure controller 20 sends a
Decrease Clock Frequency signal 110 to clock controller 30 (block 230).
In response to signal 110, clock controller 30 reduces the frequency
indicated by Desired Clock Frequency signal 140. In response to signal
140, clock generator 40 reduces the frequency of the clock signal 150 that
it applies to CPU 50. CPU 50 generates less heat when the frequency of
clock signal 150 is reduced. The reduced heat generation at least slows the
rate at which the die temperature of CPU 50 increases.

[0021] While CPU 50 is in a reduced heat generation mode, service
personnel can remove and replace fan 54 without the die temperature of 
CPU 50 rising so much that CPU 50 becomes damaged. 

[0022] While fan 54 is being replaced, CPU 50 optionally provides a
signal indicating the current CPU temperature 130 to maintenance
procedure controller 20 (block 250). Maintenance procedure controller 20
indicates the current CPU temperature to maintenance personnel as CPU
temperature indication 80 (block 270). Indication 80 may be audible,
visual (either textual or graphical) or the like. For example, maintenance
procedure controller 20 may display the CPU temperature in a user
interface on a control panel (not shown) of the computer. The display may
be provided by way of any suitable technology. For example, the display
may include any of: LCD display panels, LED or LCD displays, GUIs,
and the like. The display may be located in any suitable location. In some
embodiments, the display is located in a position where it is visible to a
technician who is viewing CPU 50 through an opening in a case within
which the cooling system for CPU 50 is housed.

[0023] The displayed temperature may be continuously updated to
show the slow rise in temperature that occurs without the cooling air flow
provided by the cooling fan. In the alternative or in addition, maintenance
procedure controller 20 may generate warning signals if certain
temperature thresholds are exceeded. Maintenance procedure controller 20
may monitor the current temperature of CPU 50 and a rate at which the
temperature of CPU 50 is increasing and may calculate and display an
estimated amount of time remaining before a temperature threshold is
reached. The estimated amount of time remaining may be used by
maintenance personnel to determine whether the fan replacement is
proceeding quickly enough to be completed before the temperature of
CPU 50 rises to an unacceptable level.
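
One simple way to compute such an estimate, offered only as a sketch, is a linear extrapolation from the current temperature and its recent rate of rise. The description does not specify the calculation; the function names, the units (degrees Celsius and seconds) and the linear model are assumptions of this example.

```python
from typing import List, Optional, Tuple


def rate_of_rise(samples: List[Tuple[float, float]]) -> float:
    """Average rate of temperature change in degrees C per second.

    samples holds (time_s, temp_c) pairs, oldest first, with distinct times.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    return (c1 - c0) / (t1 - t0)


def seconds_remaining(temp_c: float, rate_c_per_s: float,
                      threshold_c: float) -> Optional[float]:
    """Linear estimate of the time left before threshold_c is reached.

    Returns 0.0 if the threshold has already been reached and None if the
    temperature is not rising, in which case no finite estimate exists.
    """
    if temp_c >= threshold_c:
        return 0.0
    if rate_c_per_s <= 0:
        return None
    return (threshold_c - temp_c) / rate_c_per_s
```

For instance, a CPU at 52 degrees that has risen 2 degrees in the last 8 seconds is estimated to reach a 90 degree threshold in 152 seconds.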


[0024] After maintenance personnel have replaced fan 54, a Finished
Replacing Fan signal 90 is provided to maintenance procedure controller
20. Signal 90 may be provided by operating a button or control panel on
the computer. In response to receiving signal 90 (as determined at block
290), maintenance procedure controller 20 sends an Increase Clock
Frequency signal 120 (block 295) to clock controller 30. Clock controller
30 responds by sending a larger Desired Clock Frequency signal 140 to
clock generator 40. Clock generator 40 increases the frequency of the
clock signal 150 that it applies to CPU 50. Once CPU 50 starts operating
at the higher clock frequency 150, it generates additional heat.
Maintenance procedure controller 20 terminates the fan replacement
procedure and optionally issues a Fan Replacement Procedure Completed
signal 100. In some embodiments, signal 100 is provided to a control
console remote from CPU 50.
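
The sequence of signals 60, 70 and 90 described for method 200 can be pictured as a small state machine. The following Python sketch is illustrative only: the class names, the Phase enumeration and the use of the reference numerals as signal codes are assumptions of this example, and temperature monitoring (blocks 250 and 270) is omitted.

```python
from enum import Enum, auto

# Reference numerals from the description, reused here as signal codes.
START, ABOUT_TO_REPLACE, FINISHED, COMPLETED = 60, 70, 90, 100


class ClockController:
    """Stand-in for clock controller 30; tracks the desired frequency."""

    def __init__(self, normal_hz: int, reduced_hz: int) -> None:
        self.normal_hz = normal_hz
        self.reduced_hz = reduced_hz
        self.desired_hz = normal_hz

    def decrease(self) -> None:  # response to signal 110
        self.desired_hz = self.reduced_hz

    def increase(self) -> None:  # response to signal 120
        self.desired_hz = self.normal_hz


class Phase(Enum):
    IDLE = auto()
    WAITING = auto()   # waiting for About to Replace Fan
    REDUCED = auto()   # CPU in reduced heat generation mode


class MaintenanceProcedureController:
    def __init__(self, clock: ClockController) -> None:
        self.clock = clock
        self.phase = Phase.IDLE
        self.emitted = []  # signals sent back to the console

    def on_signal(self, signal: int) -> None:
        if self.phase is Phase.IDLE and signal == START:
            self.phase = Phase.WAITING
        elif self.phase is Phase.WAITING and signal == ABOUT_TO_REPLACE:
            self.clock.decrease()
            self.phase = Phase.REDUCED
        elif self.phase is Phase.REDUCED and signal == FINISHED:
            self.clock.increase()
            self.emitted.append(COMPLETED)
            self.phase = Phase.IDLE
        # Any other signal is ignored in this sketch.
```

Signals that arrive out of order are ignored here; a real controller might instead log or reject them.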

[0025] While CPU 50 is being run in the reduced heat generation
mode, the frequency of clock signal 150 is reduced to a low, but non-zero
level. As a result, CPU 50 continues to execute software instructions
during the procedure. In some embodiments of the invention, the
frequency of the clock signal is reduced to 15% or less, and preferably to
10% or less of its normal value (i.e. the clock frequency is reduced by
85%, and preferably by 90% in switching from the normal operating mode
to the reduced heat generating mode). For example, a normal 2.0 GHz
clock signal applied to CPU 50 might be reduced to 100 MHz (5% of its
normal value), or less while CPU 50 is being run in the reduced heat
generation mode. For another example, in the normal operating mode the
clock frequency may be in excess of 1.5 GHz and in the reduced heat
generating mode the clock frequency may be less than 250 MHz.
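
The arithmetic behind these figures can be checked with a few lines of Python. This is a sketch only; the function names are hypothetical and frequencies are expressed in hertz.

```python
def fraction_of_normal(normal_hz: float, reduced_hz: float) -> float:
    """Reduced clock frequency as a fraction of the normal frequency."""
    return reduced_hz / normal_hz


def satisfies_limit(normal_hz: float, reduced_hz: float,
                    max_fraction: float) -> bool:
    """True if the reduced clock is at or below max_fraction of normal."""
    return reduced_hz <= max_fraction * normal_hz


# The 2.0 GHz to 100 MHz example from the text.
example_fraction = fraction_of_normal(2.0e9, 100e6)
```

With the 2.0 GHz and 100 MHz values above, example_fraction is 0.05, which satisfies both the 15% limit and the preferred 10% limit.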

[0026] While cooling fan 54 is removed, and CPU 50 is running in
the reduced heat generation mode, the temperature of CPU 50 may
continue to rise. Therefore, if the cooling fan is not replaced and put back
into operation soon enough, even at the reduced clock frequency the
temperature of CPU 50 may rise to an unacceptable level. Most modern
CPUs are equipped with thermal protection and will shut down if safe
operating temperatures are exceeded. Where CPU 50 includes such
thermal protection, if the maintenance personnel do not replace the cooling
fan soon enough, CPU 50 will be shut down before it is damaged.


[0027] Maintenance procedure controller 20 may optionally be
capable of causing CPU 50 to shut down. Maintenance procedure
controller 20 may monitor a current CPU temperature 130. If CPU
temperature 130 exceeds a threshold, then maintenance procedure
controller 20 could send a signal to cause CPU 50 to be shut down. This
functionality may be used to particular advantage in cases where CPU 50
does not have built-in over-temperature protection.

[0028] In the apparatus shown in Figure 1, clock controller 30 and
clock generator 40 are shown as being separate from CPU 50. These
components could be combined in any suitable combination. By way of
example only, clock controller 30 and clock generator 40 could be
integrated with one another; one or both of clock controller 30 and clock
generator 40 could be integrated with CPU 50.


[0029] Figure 3 shows a system 400 according to an alternative
implementation of the invention. Maintenance procedure controller 20'
interacts with maintenance personnel as described above. However,
instead of controlling a frequency of clock signal 150 by interacting with
clock generator 40, maintenance procedure controller 20' issues a stream
of HALT 430 and RESUME 432 commands to CPU 50. Commands 430
and 432 may comprise any suitable signals provided to CPU 50. For
example, issuing a sequence of commands 430 and 432 may comprise
toggling logic signals applied to a halt pin on CPU 50. HALT commands
430 disable CPU 50 or otherwise place CPU 50 in an idle state in which
heat generation is significantly reduced. RESUME commands 432 enable
CPU 50. The rate at which CPU 50 generates heat can be controlled by
varying the relative lengths of time during which CPU 50 is enabled and
disabled. In system 400, the frequency of clock signal 150 does not need
to be adjusted during the fan replacement procedure.

[0030] As seen in Figure 4, the periodic HALT 430 and RESUME
432 commands impose a duty cycle on clock signal 150. The result is that
CPU 50 experiences an effective clock signal 502. In the example of
Figure 4, CPU 50 is only enabled for one out of every four pulses of clock
signal 150 (i.e. effective clock signal 502 has a 25% duty cycle - 3 out of 4
clock pulses have been removed leaving 1 out of 4 clock pulses). In this
example, the stream of HALT and RESUME commands causes CPU 50 to
run at 25% of its regular speed. Heat output is reduced. By varying the
periodicity of the alternating HALT and RESUME commands, duty cycles
of less than or greater than 25% can be achieved. In some embodiments of
the invention, running CPU 50 in the reduced heat generation mode
comprises applying HALT and RESUME commands such that the CPU
operates at a duty cycle of 25% or less.
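
The gating of clock pulses described for Figure 4 can be modelled in a few lines. The following Python sketch is illustrative only; representing each clock pulse as a list entry, and the function names used, are assumptions of this example.

```python
from typing import List


def effective_clock(num_pulses: int, enabled_per_period: int,
                    period: int) -> List[int]:
    """Return 1 for each clock pulse the CPU sees and 0 for each pulse
    masked while the CPU is halted.

    In every group of `period` pulses, the first `enabled_per_period`
    pulses are passed through and the rest are removed.
    """
    return [1 if (i % period) < enabled_per_period else 0
            for i in range(num_pulses)]


def duty_cycle(pulses: List[int]) -> float:
    """Fraction of pulses during which the CPU is enabled."""
    return sum(pulses) / len(pulses)


# One enabled pulse out of every four, as in the Figure 4 example.
figure_4 = effective_clock(8, 1, 4)
```

Here figure_4 is [1, 0, 0, 0, 1, 0, 0, 0], which corresponds to a duty cycle of 0.25.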

[0031] Returning to Figure 3, during the fan replacement,
temperature 130 may be monitored and displayed to the maintenance
personnel, as described above. When maintenance procedure controller
20' receives a Finished Replacing Fan signal 90 from the maintenance
personnel, maintenance procedure controller 20' returns CPU 50 to a full
duty cycle clock signal (for example, by issuing a RESUME command and
then ceasing issuing the stream of HALT and RESUME commands). The
fan replacement procedure subsequently terminates.


[0032] The duty cycle of microprocessor 50 may be varied in other
manners than by issuing HALT and RESUME commands. Existing
microprocessors (e.g. Intel Pentium IV™ and AMD Opteron™) have
built-in mechanisms for changing the duty cycle in increments of 12.5% as
part of their support for the Advanced Configuration and Power Interface
(ACPI) management standard. Periodically halting CPU 50 can provide
finer control over the duty cycle of CPU 50 than is possible by using
current ACPI functionality. In some embodiments of the invention, both
built-in mechanisms, for example ACPI, and external mechanisms, for
example toggling a signal applied to a HALT pin, are used in combination
to achieve the reduced heat generating mode.
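
The difference in granularity can be made concrete with a short Python sketch. It models only the 12.5% step size stated above and does not represent any actual ACPI or processor interface; the function names and the example timings are hypothetical.

```python
import math

BUILTIN_STEP = 0.125  # 12.5% increments, as stated in the description


def builtin_duty_at_most(target: float) -> float:
    """Largest multiple of 12.5% that does not exceed the target duty cycle.

    A result of 0.0 means no non-zero built-in step is low enough.
    The small epsilon guards against floating point round-off.
    """
    steps = math.floor(target / BUILTIN_STEP + 1e-9)
    return steps * BUILTIN_STEP


def halt_resume_duty(t_on: float, t_off: float) -> float:
    """Duty cycle obtained by enabling the CPU for t_on, then halting
    it for t_off, repeatedly."""
    return t_on / (t_on + t_off)
```

For a target duty cycle of 20%, the built-in steps offer only 12.5%, while an enabled interval of 1 ms followed by a halted interval of 4 ms gives 20%.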

[0033] Figure 5 is a flowchart illustrating a method 500 which may
be performed by maintenance procedure controller 20' in system 400 of
Figure 3. Method 500 starts upon receipt of a Start Fan Replacement
Procedure signal 60. Method 500 loops at block 510 until an About to
Replace Fan signal 70 is received from the maintenance personnel. After
signal 70 is received, a sequence of alternating HALT and RESUME
commands 430 and 432 is generated in loop 512.

[0034] In block 520 a HALT command 430 is sent to CPU 50.
Method 500 then waits in block 524 for an interval before sending a
RESUME command 432 to CPU 50 in block 526. If block 528 determines
that a Finished Replacing Fan signal 90 has been received from the
maintenance personnel, then method 500 optionally sends a Fan
Replacement Procedure Completed signal 100 and terminates. The CPU is
running at full duty cycle as a result of the RESUME command 432 issued in
the most recent iteration of block 526.

[0035] If block 528 determines that signal 90 has not been received,
method 500 proceeds to block 530 where a current CPU temperature 130
of CPU 50 is monitored. In block 532 method 500 displays the current
CPU temperature. In block 534 method 500 waits for a period T_on before
continuing to block 520. Neglecting the time taken to execute blocks other
than blocks 524 and 534, method 500 provides a duty cycle of
approximately T_on/(T_off + T_on).
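
Loop 512 can be sketched in Python as follows. This is an illustrative sketch only, assuming that the interval waited in block 524 is the halted time t_off and that the hardware interactions are supplied as callbacks; the callback names (finished, read_temp, send, display, wait) are hypothetical.

```python
from typing import Callable


def method_500_loop(t_off: float, t_on: float,
                    finished: Callable[[], bool],
                    read_temp: Callable[[], float],
                    send: Callable[[str], None],
                    display: Callable[[float], None],
                    wait: Callable[[float], None]) -> None:
    """Loop 512, entered after About to Replace Fan signal 70 is received."""
    while True:
        send("HALT")               # block 520
        wait(t_off)                # block 524: CPU halted
        send("RESUME")             # block 526
        if finished():             # block 528: signal 90 received?
            send("COMPLETED")      # optional signal 100
            return
        display(read_temp())       # blocks 530 and 532
        wait(t_on)                 # block 534: CPU enabled


def duty_cycle(t_off: float, t_on: float) -> float:
    """Approximate duty cycle, neglecting time spent outside the waits."""
    return t_on / (t_off + t_on)
```

In a real controller, wait might be time.sleep and send might toggle a halt pin. Passing them in as callbacks keeps the sketch testable without hardware.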


[0036] Figure 6 is a block diagram of apparatus 600 which is a
variation of apparatus 400 of Figure 3. In apparatus 600 maintenance
procedure controller 20" does not directly send HALT and RESUME
commands 430, 432 to CPU 50. Instead, maintenance procedure controller
20" sends HALT and RESUME commands 430, 432 to a support system
612. Support system 612 is typically provided in an integrated circuit.
Support system 612 issues HALT and RESUME commands 430A and
432A respectively to CPU 50 in response to receiving HALT and
RESUME commands 430, 432 from maintenance procedure controller
20". Support system 612 may comprise a support chip (e.g. north bridge,
south bridge, I/O hub, etc.). Support system 612 may implement the ACPI
management standard.

[0037] In some embodiments of the invention a computer system
which houses CPU 50 or a computer system which is physically near to
CPU 50 includes a software configurable control. Upon the receipt of Start
Fan Replacement Procedure signal 60 the control is placed in a first mode
such that a first activation of the control causes About to Replace Fan
signal 70 to be generated. The first activation of the control directly or
indirectly places the control in a second mode. When the control is in the
second mode, activation of the control causes Finished Replacing Fan
signal 90 to be generated.
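
The two-mode behaviour of such a control can be sketched as follows. The class name ServiceControl, its methods, and the choice to disarm the control after the second activation are assumptions of this example; the description does not state what happens after signal 90 is generated.

```python
from typing import Optional

ABOUT_TO_REPLACE, FINISHED = 70, 90  # reference numerals used as codes


class ServiceControl:
    """A single control (e.g. a pushbutton) whose meaning depends on mode."""

    def __init__(self) -> None:
        self.mode: Optional[int] = None  # None: control is not armed

    def arm(self) -> None:
        """Called on receipt of Start Fan Replacement Procedure signal 60."""
        self.mode = 1

    def activate(self) -> Optional[int]:
        """Return the signal generated by one activation, if any."""
        if self.mode == 1:
            self.mode = 2
            return ABOUT_TO_REPLACE
        if self.mode == 2:
            self.mode = None  # assumption: disarm once servicing is done
            return FINISHED
        return None
```

An activation before the control has been armed produces no signal, so an accidental press during normal operation has no effect in this sketch.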

[0038] In apparatus according to other embodiments of the invention
About to Replace Fan signal 70 and/or Finished Replacing Fan signal 90
are generated automatically in response to monitoring parameters relating
to the fan. For example, upon the receipt of Start Fan Replacement
Procedure signal 60, a maintenance procedure controller may monitor a
current draw of the fan. If the fan current draw suddenly drops to zero (as
would occur if a technician disconnected the fan from its power source in
preparation for removing the fan) the maintenance procedure controller
automatically generates About to Replace Fan signal 70 (for example, by
interpreting the current drop as About to Replace Fan signal 70 or by
causing a separate signal to be generated). When the fan current draw
returns to a typical value (as would occur when the technician connects a
replacement fan) - or when the fan current draw returns to a typical value
and the CPU temperature begins to level off or drop - the maintenance
procedure controller automatically generates Finished Replacing Fan
signal 90. The portion of the maintenance procedure controller which
monitors fan current draw may be physically separated from other parts of
the maintenance procedure controller.
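
Automatic generation of signals 70 and 90 from the fan current draw might be sketched as follows. The function name, the near-zero cutoff and the tolerance band around the typical current are all assumptions of this example, and the optional check on CPU temperature is omitted.

```python
from typing import List, Sequence, Tuple

ABOUT_TO_REPLACE, FINISHED = 70, 90  # reference numerals used as codes


def signals_from_fan_current(samples_amps: Sequence[float],
                             typical_amps: float,
                             zero_amps: float = 0.01,
                             tolerance: float = 0.2
                             ) -> List[Tuple[int, int]]:
    """Scan fan current samples taken after signal 60 was received.

    Returns (sample_index, signal) pairs. Signal 70 is generated when the
    current falls to (nearly) zero; signal 90 is generated when it later
    returns to within `tolerance` (as a fraction) of the typical value.
    """
    signals: List[Tuple[int, int]] = []
    fan_disconnected = False
    for i, amps in enumerate(samples_amps):
        if not fan_disconnected and amps <= zero_amps:
            signals.append((i, ABOUT_TO_REPLACE))
            fan_disconnected = True
        elif fan_disconnected and abs(amps - typical_amps) <= tolerance * typical_amps:
            signals.append((i, FINISHED))
            break  # the procedure is complete
    return signals
```

A cutoff slightly above zero is used because a real current sensor rarely reads exactly zero.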

[0039] In apparatus according to other embodiments of the invention
the About to Replace Fan signal 70 may be generated automatically in
response to the opening of a service panel. For example, opening a service
panel to access a cooling system for CPU 50 (e.g. fan 54) may change the
state of a microswitch which causes About to Replace Fan signal 70 to be
generated.

[0040] In any of the implementations of the invention described
above, the maintenance procedure controller may comprise: a suitably
programmed data processor; hardware logic circuits, which may be
provided in the form of an FPGA (Field Programmable Gate Array), ASIC
(Application Specific Integrated Circuit), etc.; or some combination
thereof. In some embodiments of the invention the functions of the
maintenance procedure controller are provided by hardware, or hardware
and software resident within a single integrated circuit. Where CPU 50 is
part of a multi-processor computer system, the functions of the
maintenance procedure controller may be provided by causing one of the
other processors in the multi-processor computer system to act as the
maintenance procedure controller.

[0041] As an example implementation of the invention, consider a
multi-processor computer system 700 as shown in Figure 7. System 700
has hundreds of CPUs 50 each cooled by a fan or other cooling system.
CPUs 50 are distributed among a number of chassis 704 which are
interconnected by a data communication network 706. Each chassis 704
may house one or several CPUs 50. Computer system 700 has a control
console 708 which can communicate with each of the chassis.
Maintenance staff decide that the cooling system of one CPU 50A requires
replacement. A person at console 708 causes the console to issue a Start
Fan Replacement Procedure signal to a maintenance procedure controller 20A
associated with CPU 50A.


[0042] Maintenance procedure controller 20A is connected to detect
a signal which results when maintenance personnel activate a control 710
associated with the chassis 704A in which CPU 50A is housed. In this
example, control 710 is a pushbutton on chassis 704A. In response to the
Start Fan Replacement Procedure signal, maintenance procedure controller
20A configures itself to interpret the signal resulting from the actuation of
control 710 as an About to Replace Fan signal.

[0043] A technician proceeds to chassis 704A. The technician may
access CPU 50A through a service panel 709 or other suitable opening.
When the technician is ready to replace the cooling system for CPU 50A,
the technician actuates control 710. Maintenance procedure controller 20A
then causes CPU 50A to operate in a reduced heat generating mode and
configures itself to recognize the next actuation of control 710 as a
Finished Replacing Fan signal. The technician replaces or otherwise
services the cooling system for CPU 50A. While the technician is
servicing the cooling system for CPU 50A, maintenance procedure
controller 20A causes the current temperature of CPU 50A, and the
estimated time remaining before the cooling system must be placed back in
service or CPU 50A shut down, to be shown on a display 712 located
where the technician can see it.



[0044] When the technician completes servicing the cooling system
for CPU 50A, the technician actuates control 710 again. This causes
maintenance procedure controller 20A to place CPU 50A in its normal
operating mode and to send a Fan Replacement Completed signal back to
console 708 where it can be logged.

[0045] Commands and other signals may be implemented in any
suitable manner including by way of technologies such as:
• analog or digital electrical signals;
• packet-based message protocols;
• optical signals;
• signals carried on a wireless data communication medium;
• combinations of the above;
• and the like.

[0046] Certain implementations of the invention comprise computer
processors which execute software instructions which cause the processors
to perform a method of the invention. For example, the maintenance
procedure controllers in any of the embodiments described herein may
comprise one or more processors executing software commands which
cause the processors to implement methods of the invention such as, for
example, the methods of Figures 2 or 5. The invention may also be
provided in the form of a program product. The program product may
comprise any medium which carries a set of computer-readable signals
comprising instructions which, when executed by a data processor, cause
the data processor to execute a method of the invention. Program products
according to the invention may be in any of a wide variety of forms. The
program products may comprise, for example, physical media such as
magnetic data storage media including floppy diskettes and hard disk
drives; optical data storage media including CD ROMs and DVDs;
electronic data storage media including ROMs and flash RAM; or the like;
or transmission-type media such as digital or analog communication links.

[0047] Where a component (e.g. a software module, processor,
assembly, device, circuit, etc.) is referred to above, unless otherwise
indicated, reference to that component (including a reference to a "means")
should be interpreted as including as equivalents of that component any
component which performs the function of the described component (i.e.,
that is functionally equivalent), including components which are not
structurally equivalent to the disclosed structure which performs the
function in the illustrated exemplary embodiments of the invention.

[0048] As will be apparent to those skilled in the art in light of the
foregoing disclosure, many alterations and modifications are possible in
the practice of this invention without departing from the spirit or scope
thereof. For example:
• Some of the embodiments described above place CPU 50 into a
reduced heat generating mode by reducing the frequency of clock
signal 150. In other embodiments, the same end is achieved by
reducing the duty cycle of CPU 50. In other embodiments of the
invention, placing CPU 50 into the reduced heat generating mode
involves both reducing the frequency of clock signal 150 and
reducing a duty cycle of CPU 50.
• The reduced heat generating mode need not be characterized by a 
constant clock frequency and/or duty cycle. In some embodiments 
of the invention the clock frequency and/or duty cycle are varied 
when CPU 50 is in the reduced heat generating mode. In some such 
embodiments, the clock frequency and/or duty cycle are varied in 
response to the CPU temperature so as to maintain the rate at which 
the CPU temperature rises below a threshold or so as to provide at 
least a predetermined amount of time before the CPU temperature 
rises to some threshold value. In some embodiments, the clock 
frequency and/or duty cycle are varied so as to control the 
temperature of CPU 50 to increase at about, but not more than, a 
maximum desired rate. The maximum desired rate is selected to 
provide sufficient time for servicing the cooling system. Controlling 
the CPU to allow its temperature to increase at about the maximum 
desired rate avoids reducing performance of CPU 50 by an 
unnecessarily large amount. The maximum desired rate may be 
configurable. If the maximum desired rate is configurable, a slow 
technician, or a technician who has a complicated service operation 
to perform, may select a lower maximum desired rate than a faster 
technician, or a technician who has to perform a very simple service 
operation which can be completed quickly. 
• The reduced heat generating mode may be achieved in manners 
other than as described above. For example, heat-generating 
subsystems within CPU 50 (or another electronic device to which 
the invention is being applied) may be halted, disabled, or otherwise 
caused to generate reduced heat in the reduced heat generating 
mode. In some embodiments of the invention CPU 50 includes a 
cache memory and placing CPU 50 into the reduced heat generating 
mode comprises either disabling the cache memory or operating the 
cache memory at a reduced frequency. 

• Service personnel may use any suitable mechanisms to generate 
About to Replace Fan signal 70 and Finished Replacing Fan signal 
90. 

• While the invention has been discussed in terms of decreasing the 
heat output by a CPU while a cooling fan is being replaced, the 
invention is equally applicable to any similar computer system 
component that generates significant heat. For example, the 
invention could be applied to a graphics processing unit (GPU) on a 
video card. 

• There need not be a 1:1 relationship between CPUs 50 and 
maintenance procedure controllers 20 (or 20' or 20"). A single 
maintenance procedure controller 20 (or 20' or 20") may be 
provided to permit maintenance of the cooling systems of several 
CPUs. 

• The methods of the invention may comprise operating an auxiliary 
active cooling system to provide supplementary cooling to CPU 50 
(or another electronic device to which the invention is being 
applied) while the cooling system associated with CPU 50 is being 
serviced. The auxiliary cooling system may comprise a cooling 
system which normally cools some other device, such as an adjacent 
CPU 50. For example, operating the auxiliary cooling system to 
provide some cooling to CPU 50 may comprise operating a fan 
which normally cools a nearby CPU 50, or a fan which normally 
operates to ventilate a housing within which a heat sink associated 
with CPU 50 is located, at a higher than normal speed so as to cause 
some cooling airflow past CPU 50. 

• Some general methods according to the invention are for servicing a 
cooling system associated with one or more electronic devices in an 
apparatus. Such general methods comprise servicing the cooling 
system associated with the one or more electronic devices, for 
example by replacing the cooling system or a component thereof. 
While the servicing is depriving the one or more electronic devices 
of their normal cooling, the methods operate the apparatus in a 
temperature control mode which reduces temperature rise in the one 
or more electronic devices. The one or more electronic devices 
continue to operate. In such embodiments of the invention the 
temperature control mode may comprise operating the one or more 
electronic devices in a reduced heat generating mode, for example, 
in any manner described herein, and/or providing supplementary 
active cooling to the one or more electronic devices, for example by 
operating a cooling system in the apparatus at a higher than normal 
output, while the cooling system associated with the one or more 
electronic devices is serviced. 
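
By way of illustration only, the variation described above in which the clock frequency is varied so that the CPU temperature increases at about a maximum desired rate might be sketched in software as follows. This is a minimal sketch and not the disclosed implementation: the function names, the proportional gain, the scale limits, and the example temperatures are all assumptions made for the example. A simple proportional adjustment of this kind only approximates the "not more than" behaviour, since it reacts after the measured rate has changed.

```python
def max_rate_for_service(temp_now_c, temp_limit_c, service_time_s):
    """Largest average rise rate (deg C per second) that keeps the device
    below temp_limit_c for service_time_s seconds.

    A technician who needs more time supplies a larger service_time_s and
    so obtains a lower maximum desired rate.
    """
    if service_time_s <= 0:
        raise ValueError("service_time_s must be positive")
    # If the device is already at or above the limit, no rise is allowed.
    return max(0.0, (temp_limit_c - temp_now_c) / service_time_s)


def next_clock_scale(scale, measured_rate, max_rate, gain=0.5,
                     min_scale=0.1, max_scale=1.0):
    """One proportional adjustment of the clock-frequency scale factor.

    scale is the fraction of the normal clock frequency currently in use.
    A measured rise rate above max_rate lowers the scale; a rate below
    max_rate raises it, so performance is not reduced more than needed.
    gain, min_scale and max_scale are hypothetical tuning values.
    """
    error = max_rate - measured_rate  # positive means room to speed up
    new_scale = scale + gain * error
    # Clamp to the assumed range of usable clock scales.
    return min(max_scale, max(min_scale, new_scale))
```

For example, with an assumed starting temperature of 50 degrees C, an assumed limit of 80 degrees C, and 300 seconds allowed for servicing, max_rate_for_service(50.0, 80.0, 300.0) gives a maximum desired rate of about 0.1 degrees C per second. Calling next_clock_scale repeatedly with freshly measured rise rates then moves the clock scale down when the temperature rises too quickly and back up when there is headroom.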

Accordingly, the scope of the invention is to be construed in accordance 
with the substance defined by the following claims. 



